LCORE-1518: Skip model authentication against Azure by asimurka · Pull Request #1766 · lightspeed-core/lightspeed-stack

asimurka · 2026-05-19T08:04:50Z

Description

Optimizes Azure Entra ID token handling by skipping startup authentication for Azure models. This allows to deffer token acquisition to inference requests allowing to remove prestart token acquisition and passing by environment variables.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Cursor

Related Tickets & Documents

Related Issue #
Closes # LCORE-1518

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Refactor
- Azure authentication now obtains and supplies tokens at runtime via secure per-request headers; tokens are cached in memory rather than injected from environment files.
- Configuration enrichment disables Azure model validation automatically and no longer requires container .env files at startup.
Chores
- Startup/run process updated to generate and use an enriched configuration before launching the service.
Documentation
- Updated Azure Entra ID authentication guide and examples to reflect deferred auth and new config fields.

coderabbitai · 2026-05-19T08:05:06Z

Caution

Review failed

Failed to post review comments

Walkthrough

This PR moves Azure Entra ID token handling from startup-time .env generation to in-memory caching and per-request header injection. It adds client-holder methods to refresh and apply Azure provider data, enriches configs to disable Azure model validation at startup, updates endpoints/startup/scripts/tests, and revises docs/examples.

Changes

Azure Token Handling Refactoring: In-Memory Caching & Runtime Acquisition

Layer / File(s)	Summary
Token Manager In-Memory Storage `src/authorization/azure_token_manager.py`, `tests/unit/authorization/test_azure_token_manager.py`	`AzureEntraIDManager` now caches `_access_token` and `_azure_base_url` in memory, exposes `access_token`/`azure_base_url` properties, `set_base_url()` and `build_azure_provider_data()`. `_update_access_token()` no longer writes to environment variables; tests updated for in-memory semantics.
Configuration Enrichment with Model Validation `src/llama_stack_configuration.py`, `tests/unit/test_llama_stack_configuration.py`	Adds `enrich_azure_entra_id_inference()` to set `model_validation: false` for `remote::azure` when `azure_entra_id` is configured; removes token-generation and `--env-file` handling. `generate_configuration()` signature updated to drop `env_file`.
Client Holder Token Update & Azure Base URL Query `src/client.py`, `tests/unit/test_client.py`	Adds `AsyncLlamaStackClientHolder.update_azure_token()` (async) to obtain Azure provider data and rebuild or update the held client (library vs service modes). Adds `get_azure_base_url()` and integrates `enrich_azure_entra_id_inference()` into library-mode enrichment; old `update_provider_data()` removed. Tests cover header injection and deferral behavior.
Endpoint Handler Token Refresh Integration `src/app/endpoints/query.py`, `src/app/endpoints/responses.py`, `src/app/endpoints/streaming_query.py`, `src/app/endpoints/rlsapi_v1.py`, `src/utils/query.py`, `tests/unit/app/endpoints/*`, `tests/unit/utils/test_query.py`	Endpoint handlers now refresh Azure tokens by calling `AsyncLlamaStackClientHolder().update_azure_token()`; the standalone `update_azure_token()` helper was removed from `utils/query.py`. Endpoint tests were updated to mock the client holder method.
Startup & Runtime Integration `src/app/main.py`, `scripts/llama-stack-entrypoint.sh`, `tests/e2e-prow/.../llama-stack-openai.yaml`, `tests/e2e-prow/.../llama-stack-prow.yaml`	App lifespan no longer performs immediate startup token minting; it initializes the client, then configures `AzureEntraIDManager` and sets its base URL using `get_azure_base_url()`. Entrypoint scripts and manifests stop sourcing/passing a `.env` file during enrichment.
Documentation & Configuration Examples `docs/providers.md`, `docs/rag_guide.md`, `examples/azure-run.yaml`, `tests/e2e/configs/run-azure.yaml`	Docs updated to describe deferred inference-time Entra ID auth via `X-LlamaStack-Provider-Data`, in-memory token caching, and removal of env-file usage; example configs drop `api_key` and add `model_validation: false`.
Makefile Simplification `Makefile`	`run-llama-stack` target now generates an enriched config via `src/llama_stack_configuration.py` before running `uv run llama stack run` with the enriched config.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

tisnik

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'Skip model authentication against Azure' directly and specifically describes the main objective of the changeset: deferring Azure Entra ID token acquisition from startup to inference time by setting model_validation to false for Azure providers.
Docstring Coverage	✅ Passed	Docstring coverage is 97.37% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/client.py (1)

257-263: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Harden provider-data JSON parsing before merge.

At Line 258, valid JSON that is not an object (for example, a list) will pass decoding and then fail at Line 262 on .update(...).

Proposed fix

-        try:
-            provider_data = json.loads(provider_data_json) if provider_data_json else {}
-        except (json.JSONDecodeError, TypeError):
-            provider_data = {}
+        try:
+            parsed = json.loads(provider_data_json) if provider_data_json else {}
+            provider_data = parsed if isinstance(parsed, dict) else {}
+        except (json.JSONDecodeError, TypeError):
+            provider_data = {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/client.py` around lines 257 - 263, The JSON parsing for provider data
currently assumes json.loads(provider_data_json) yields a dict and then calls
provider_data.update(updates); to fix, after decoding (and in the existing
except block) ensure the decoded value is a dict (use isinstance(provider_data,
dict)) and if it's not, replace it with an empty dict before calling
provider_data.update(updates); keep the existing exception handling for
JSONDecodeError/TypeError and apply this check where provider_data and
provider_data_json are used.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/providers.md`:
- Around line 95-107: The docs use two different config keys—`base_url` in the
prose and `api_base` in the YAML—causing inconsistency with the actual schema;
pick the correct schema key (whichever the code expects for the `remote::azure`
provider) and update both the descriptive text and the example so they match
(for example, replace `api_base` with `base_url` or vice versa) and keep
references to `azure_entra_id`, `model_validation: false`, `provider_id: azure`,
and `provider_type: remote::azure` unchanged.
- Around line 114-118: Update the lifecycle text under "Lightspeed startup
(library and service mode)" to remove the claim that Lightspeed "Acquires an
initial access token from Microsoft Entra ID" and "Caches the token in memory
(`SecretStr`, per worker)" at startup; instead state that Lightspeed reads Entra
ID configuration at startup but defers authentication until request time,
acquiring and caching access tokens on demand per-request (and still
initializing the Llama Stack client with `X-LlamaStack-Provider-Data` as
before). Ensure the revised step explicitly reflects the runtime-only,
deferred-auth flow implemented in the PR.

In `@src/client.py`:
- Around line 245-250: When initializing the AsyncLlamaStackAsLibraryClient
inside the token-refresh path, wrap the await client.initialize() call in a
try/except that catches initialization failures (broad Exception or the specific
initialization error type) and translate them into a service-unavailable error
(e.g., raise or return a ServiceUnavailableError / HTTP 503) instead of letting
the exception bubble; set self._lsc only after successful initialization (leave
self._lsc unchanged on failure) and include the client
variable/AsyncLlamaStackAsLibraryClient.initialize in your handling so callers
receive a 503-style backend-unavailable response when initialization fails.

In `@tests/unit/app/endpoints/test_streaming_query.py`:
- Line 731: The test creates an unused mock "mock_updated_client" and sets
update_azure_token to return it but never asserts it's used; either remove the
unused mock_updated_client and its return configuration to simplify the test, or
if you intend to verify the updated client is applied, change the assertions to
check that get_client() (or subsequent operations that use the client)
returns/uses mock_updated_client after calling update_azure_token; update
references to AsyncLlamaStackClient, mock_updated_client, and update_azure_token
accordingly.

---

Outside diff comments:
In `@src/client.py`:
- Around line 257-263: The JSON parsing for provider data currently assumes
json.loads(provider_data_json) yields a dict and then calls
provider_data.update(updates); to fix, after decoding (and in the existing
except block) ensure the decoded value is a dict (use isinstance(provider_data,
dict)) and if it's not, replace it with an empty dict before calling
provider_data.update(updates); keep the existing exception handling for
JSONDecodeError/TypeError and apply this check where provider_data and
provider_data_json are used.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 23accaba-033e-4848-bb78-61bb7f096894

📥 Commits

Reviewing files that changed from the base of the PR and between 559753d and 3324de6.

📒 Files selected for processing (22)

Makefile
docs/providers.md
docs/rag_guide.md
examples/azure-run.yaml
scripts/llama-stack-entrypoint.sh
src/app/endpoints/query.py
src/app/endpoints/responses.py
src/app/endpoints/streaming_query.py
src/app/main.py
src/authorization/azure_token_manager.py
src/client.py
src/llama_stack_configuration.py
src/utils/query.py
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml
tests/unit/app/endpoints/test_query.py
tests/unit/app/endpoints/test_responses.py
tests/unit/app/endpoints/test_streaming_query.py
tests/unit/authorization/test_azure_token_manager.py
tests/unit/test_client.py
tests/unit/test_llama_stack_configuration.py
tests/unit/utils/test_query.py

💤 Files with no reviewable changes (4)

Makefile
tests/unit/utils/test_query.py
docs/rag_guide.md
src/utils/query.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request

🧰 Additional context used

📓 Path-based instructions (3)

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

tests/unit/app/endpoints/test_query.py
tests/unit/app/endpoints/test_responses.py
tests/unit/test_llama_stack_configuration.py
tests/unit/app/endpoints/test_streaming_query.py
tests/unit/authorization/test_azure_token_manager.py
tests/unit/test_client.py

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/app/endpoints/streaming_query.py
src/app/endpoints/responses.py
src/app/main.py
src/authorization/azure_token_manager.py
src/llama_stack_configuration.py
src/client.py
src/app/endpoints/query.py

src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: FastAPI dependencies: Import from fastapi module for APIRouter, HTTPException, Request, status, Depends
Use FastAPI HTTPException with appropriate status codes for API endpoints and handle APIConnectionError from Llama Stack

Files:

src/app/endpoints/streaming_query.py
src/app/endpoints/responses.py
src/app/main.py
src/app/endpoints/query.py

🧠 Learnings (4)

📚 Learning: 2026-05-12T15:14:34.788Z

Learnt from: syedriko
Repo: lightspeed-core/lightspeed-stack PR: 1727
File: scripts/konflux_requirements.sh:9-15
Timestamp: 2026-05-12T15:14:34.788Z
Learning: In this repo, the `.konflux/` directory is committed/tracked and is guaranteed to exist in a fresh clone. Therefore, shell scripts that write output under `.konflux/` (e.g., create files like `.konflux/<...>`) should not waste effort by calling `mkdir -p .konflux` first. Only add directory-creation logic if the script may run in an environment/repo state where `.konflux/` might not be present.

Applied to files:

scripts/llama-stack-entrypoint.sh

📚 Learning: 2026-02-19T10:06:50.647Z

Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1181
File: tests/e2e-prow/rhoai/manifests/lightspeed/mock-jwks.yaml:32-34
Timestamp: 2026-02-19T10:06:50.647Z
Learning: In the rhoai tests under tests/e2e-prow/rhoai/manifests, avoid static ConfigMap definitions for mock-jwks-script and mcp-mock-server-script since these ConfigMaps are created dynamically by the pipeline.sh deployment script using 'oc create configmap'. Ensure there are no static ConfigMap resources for these names in the manifests. If such ConfigMaps are added in the future, coordinate with the pipeline to reflect dynamic creation or adjust tests to rely on the dynamic provisioning.

Applied to files:

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml

📚 Learning: 2026-04-06T20:18:07.852Z

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

src/app/endpoints/streaming_query.py
src/app/endpoints/responses.py
src/app/endpoints/query.py

📚 Learning: 2026-01-14T09:37:51.612Z

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.

Applied to files:

src/app/endpoints/query.py

🔇 Additional comments (17)

examples/azure-run.yaml (1)

27-27: LGTM!

src/app/main.py (1)

98-102: LGTM!

scripts/llama-stack-entrypoint.sh (1)

18-18: LGTM!

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml (1)

163-163: LGTM!

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml (1)

153-153: LGTM!

src/authorization/azure_token_manager.py (2)

36-42: LGTM!

Also applies to: 59-78

99-106: LGTM!

tests/unit/authorization/test_azure_token_manager.py (1)

56-61: LGTM!

Also applies to: 72-78, 80-92, 163-179

src/llama_stack_configuration.py (2)

46-78: LGTM!

582-583: LGTM!

Also applies to: 632-632

tests/unit/test_llama_stack_configuration.py (1)

15-15: LGTM!

Also applies to: 28-104, 527-527

tests/unit/test_client.py (1)

6-6: LGTM!

Also applies to: 13-13, 16-19, 102-178

src/app/endpoints/query.py (1)

199-207: LGTM!

src/app/endpoints/responses.py (1)

400-408: LGTM!

src/app/endpoints/streaming_query.py (1)

257-265: LGTM!

tests/unit/app/endpoints/test_query.py (1)

544-546: LGTM!

Also applies to: 568-568

tests/unit/app/endpoints/test_responses.py (1)

469-470: LGTM!

Also applies to: 485-485, 505-505

asimurka · 2026-05-19T08:54:28Z

-      api_key: ${env.AZURE_API_KEY}    # Must be exactly this - placeholder for Entra ID token
-      api_base: ${env.AZURE_API_BASE}
+      # api_key: ${env.AZURE_API_KEY}  # Can be omitted when Entra ID configured in LCORE
+      base_url: ${env.AZURE_API_BASE}


Change in attribute name

asimurka · 2026-05-19T08:55:10Z

-        # shellcheck source=/dev/null
-        set -a && . "$ENV_FILE" && set +a
-    fi
+        -o "$ENRICHED_CONFIG" 2>&1 || ENRICHMENT_FAILED=1


The API key is not passed by .env file anymore

asimurka · 2026-05-19T08:58:06Z

+    if azure_entra_id_config is not None:
+        AzureEntraIDManager().set_config(azure_entra_id_config)
+        azure_base_url = await AsyncLlamaStackClientHolder().get_azure_base_url()
+        AzureEntraIDManager().set_base_url(azure_base_url)


In FastAPI lifespan just setup manager's attributes and defer the token acquisition to inference requests.

asimurka · 2026-05-19T08:59:01Z

+        token = self.access_token.get_secret_value()
+        if not token or self.azure_base_url is None:
+            return None
+        return {"azure_api_key": token, "azure_api_base": self.azure_base_url}


Config and validator attribute discrepancy in LLS (api_base vs base_url)

asimurka · 2026-05-19T08:59:31Z

    return _is_inout_shield(shield) or not is_output_shield(shield)


-async def update_azure_token(


Replaced to client holder

are-ces

LGTM, just run e2e tests for Azure first TY

asimurka · 2026-05-21T10:46:00Z

Fixed 2 issues:

edited azure ci config: removed api key, added model_validation: false flag
added token refresh logic into rlsapi (the new logic revealed the fact that the logic was missing before)

CI running on my fork, see: https://github.com/asimurka/lightspeed-stack/actions/runs/26219639186/job/77150963063

coderabbitai Bot reviewed May 19, 2026

View reviewed changes

Comment thread docs/providers.md

Comment thread docs/providers.md Outdated

Comment thread src/client.py

Comment thread tests/unit/app/endpoints/test_streaming_query.py

asimurka force-pushed the azure_skip_model_availability branch from 3324de6 to ba728c8 Compare May 19, 2026 08:16

asimurka commented May 19, 2026

View reviewed changes

asimurka requested review from are-ces and tisnik May 21, 2026 06:21

Skip model authentication against Azure

daf6dd3

asimurka force-pushed the azure_skip_model_availability branch from ba728c8 to daf6dd3 Compare May 21, 2026 06:45

are-ces approved these changes May 21, 2026

View reviewed changes

Fixed rlsapi token refresh + CI config

0c191b0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LCORE-1518: Skip model authentication against Azure#1766

LCORE-1518: Skip model authentication against Azure#1766
asimurka wants to merge 2 commits into
lightspeed-core:mainfrom
asimurka:azure_skip_model_availability

asimurka commented May 19, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 19, 2026 •

edited

Loading

Review failed

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

asimurka May 19, 2026

Uh oh!

asimurka May 19, 2026

Uh oh!

asimurka May 19, 2026

Uh oh!

asimurka May 19, 2026

Uh oh!

asimurka May 19, 2026

Uh oh!

are-ces left a comment •

edited

Loading

Uh oh!

asimurka commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		return _is_inout_shield(shield) or not is_output_shield(shield)


		async def update_azure_token(

Conversation

asimurka commented May 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

asimurka May 19, 2026

Choose a reason for hiding this comment

Uh oh!

asimurka May 19, 2026

Choose a reason for hiding this comment

Uh oh!

asimurka May 19, 2026

Choose a reason for hiding this comment

Uh oh!

asimurka May 19, 2026

Choose a reason for hiding this comment

Uh oh!

asimurka May 19, 2026

Choose a reason for hiding this comment

Uh oh!

are-ces left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

asimurka commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

asimurka commented May 19, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 19, 2026 •

edited

Loading

are-ces left a comment •

edited

Loading